Estimating speech recognition error rate without acoustic test data

نویسندگان

  • Yonggang Deng
  • Milind Mahajan
  • Alex Acero
چکیده

We address the problem of estimating the word error rate (WER) of an automatic speech recognition (ASR) system without using acoustic test data. This is an important problem which is faced by the designers of new applications which use ASR. Quick estimate of WER early in the design cycle can be used to guide the decisions involving dialog strategy and grammar design. Our approach involves estimating the probability distribution of the word hypotheses produced by the underlying ASR system given the text test corpus. A critical component of this system is a phonemic confusion model which seeks to capture the errors made by ASR on the acoustic data at a phonemic level. We use a confusion model composed of probabilistic phoneme sequence conversion rules which are learned from phonemic transcription pairs obtained by leave-one-out decoding of the training set. We show reasonably close estimation of WER when applying the system to test sets from different domains.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

Feature space normalization in adverse acoustic conditions

We study the effect of different feature space normalization techniques in adverse acoustic conditions. Recognition tests are reported for cepstral mean and variance normalization, histogram normalization, feature space rotation, and vocal tract length normalization on a German isolated word recognition task with large acoustic mismatch. The training data was recorded in clean office environmen...

متن کامل

Towards robustness to fast speech in ASR

Psychoacoustic studies show that human listeners are sensitive to speaking rate variations 10]. Automatic speech recognition (ASR) systems are even more aaected by the changes in rate, as double to quadruple word recognition error rates of average speakers have been observed for fast speakers on many ASR systems 6]. In our earlier work 5], we studied the causes of higher error and concluded tha...

متن کامل

Robust i-vector based adaptation of DNN acoustic model for speech recognition

In the past, conventional i-vectors based on a Universal Background Model (UBM) have been successfully used as input features to adapt a Deep Neural Network (DNN) Acoustic Model (AM) for Automatic Speech Recognition (ASR). In contrast, this paper introduces Hidden Markov Model (HMM) based ivectors that use HMM state alignment information from an ASR system for estimating i-vectors. Further, we ...

متن کامل

Can continuous speech recognizers handle isolated speech?

Continuous speech is far more natural and ecient than isolated speech for communication. However, for current state-of-the-art automatic speech recognition systems, isolated speech recognition (ISR) is far more accurate than continuous speech recognition (CSR). It is common practice in the speech research community to build CSR systems using only CS data. However, slowing of the speaking rate ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003